Long-term community stability: accounting for detection error to understand the contributions of current and past environmental conditions to community states

Collaborator update - November 2023

Motivation

In this study, we want to understand how long-term datasets can reveal how communities respond to environmental change. We also want to account for detection error in this process, a practice that is well established in some subfields of ecology (e.g., wildlife ecology, mostly for birds) but less so in others (e.g., community ecology, especially for non-bird taxa). Our aim is to use a two-part modeling process: first, account for detection error in community datasets; then, use the corrected data to derive standard measures of community change/stability (e.g., dissimilarity) for use in a second model that asks how the environment (both current and past) shapes community change.

Guiding questions

Q1: How are estimates of community stability shaped by detection error?

Q2: How is community stability through time shaped by environmental factors?

Approach

In our two-part modeling framework, we first input raw survey data from a community, along with detection covariates, into a multi-species occupancy or abundance model. From this model, we derive mean values of community change (‘beta diversity’) along with the uncertainty in these values (standard deviation). Finally, we input these values into a regression model that incorporates current and lagged effects of environmental covariates via the stochastic antecedent modeling (SAM) framework to understand how the environment (both biotic and abiotic) shapes community change/stability.

Figure 1: Depiction of the modeling process, in which yellow circles depict data (inputs and outputs) and blue boxes represent models.
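To make the middle step concrete, here is a minimal sketch of how a mean and standard deviation of community change could be derived from the detection model output. It assumes the posterior is available as an array of latent abundance draws for one site with shape (draws, years, species), and that change is measured as Bray-Curtis dissimilarity between consecutive years. The function name, array layout, and simulated draws are all hypothetical and are not taken from our actual workflow.

```python
import numpy as np

def summarize_change(draws):
    """Summarize year-to-year Bray-Curtis dissimilarity across posterior draws.

    draws: array of shape (n_draws, n_years, n_species) holding latent
           abundances for ONE site (hypothetical layout).
    Returns (mean, sd), each of length n_years - 1.
    """
    draws = np.asarray(draws, dtype=float)
    # Numerator: summed absolute differences between consecutive years
    num = np.abs(draws[:, 1:, :] - draws[:, :-1, :]).sum(axis=2)
    # Denominator: total abundance across the two years being compared
    den = (draws[:, 1:, :] + draws[:, :-1, :]).sum(axis=2)
    # If both years are empty, treat dissimilarity as 0 (an assumption)
    d = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Posterior mean and SD of dissimilarity for each year pair
    return d.mean(axis=0), d.std(axis=0, ddof=1)

# Simulated stand-in for posterior draws: 200 draws, 4 years, 6 species
rng = np.random.default_rng(42)
fake_draws = rng.poisson(lam=5.0, size=(200, 4, 6))
mean_d, sd_d = summarize_change(fake_draws)
```

Computing the metric inside each draw and summarizing afterwards is what carries detection uncertainty forward into the second model, rather than computing a single dissimilarity value from posterior mean abundances.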

Datasets

Our aim is to demonstrate the general utility of the modeling process as a way to unify community ecology sub-disciplines that have historically analyzed data differently. To that end, we have compiled four datasets that span taxonomic groups and environments.

Santa Barbara Channel LTER fish community

Our first dataset is from surveys of fish in the Santa Barbara Channel LTER (SBC LTER) that span from 2000-2022.

  • Taxa: Fish
  • Environment: Marine - Kelp forest
  • Years: 23
  • Number of sites: 43
  • Number of species: 63
  • Data Type: Abundance
  • Detection covariates: fish size and dive visibility
  • Environmental covariates: seasonal temperature and annual giant kelp biomass

Konza Prairie LTER bird community

The second dataset is from surveys of grassland birds in the Konza Prairie LTER (KNZ LTER) that span from 1981-2009.

  • Taxa: Birds
  • Environment: Terrestrial - Tallgrass prairie
  • Years: 28
  • Number of sites: 3-11 (still WIP)
  • Number of species: TBD
  • Data Type: Abundance
  • Detection covariates: bird size and survey length
  • Environmental covariates: seasonal temperature and precipitation (and potentially annual plant biomass)

Sevilleta LTER grasshopper community

The third dataset is from surveys of grasshoppers in the Sevilleta LTER (SEV LTER) that span from 1992-2019.

  • Taxa: Insects
  • Environment: Terrestrial - Blue grama grassland and Creosote shrubland
  • Years: 27
  • Number of sites: 60
  • Number of species: 46
  • Data Type: Abundance
  • Detection covariates: none (none provided in the metadata; others, e.g. body size, were not easy to derive from a literature review)
  • Environmental covariates: seasonal temperature and plant biomass

Petrified Forest National Park plant community

The final dataset is a set of surveys of understory plant communities in Petrified Forest National Park (PFNP) that span from 2007-2022.

  • Taxa: Plants
  • Environment: Terrestrial - Grassland and shrubland
  • Years: 15
  • Number of sites: 10 (subset because of computation time)
  • Number of species: TBD
  • Data Type: Presence-Absence (Detection/Nondetection)
  • Detection covariates: cover class
  • Environmental covariates: seasonal precipitation and VPD

Progress so far

SBC LTER fish dataset

  • Detection model: complete
  • SAM regression model: complete

KNZ LTER bird dataset

  • Detection model: WIP
  • SAM regression model: not run

SEV LTER grasshopper dataset

  • Detection model: complete
  • SAM regression model: complete

PFNP plant dataset

  • Detection model: WIP
  • SAM regression model: not run

Potential figures and follow-up analyses

Below is a set of potential figures we could include in the paper (or supplements), along with the follow-up analyses associated with them.

Q1: Detection error

Covariates driving detection

When thinking about detection, we could look at how covariates we input into the model influence our ability to detect species:

Figure 2: Detection of fish in the SBC LTER is shaped by fish body size and dive visibility. Smaller fish are easier to detect (likely related to school sizes) and detection is higher during dives with greater visibility.

Note: There were no covariates for detection easily available for the grasshopper dataset, so there is no panel of this potential figure for that dataset.

Accounting for detection error and estimates of change

We could also look at how observed versus modeled estimates of dissimilarity compare:

Figure 3: Dissimilarity (either Bray-Curtis or Jaccard, depending on data type) computed with the modeled data (yellow) and observed data (blue) on species abundances or presences. Values closer to one correspond to more community change. For both the SEV LTER grasshopper and SBC LTER fish datasets, accounting for detection error (“modeled” data) reduces variance in estimates and reduces the mean estimate of dissimilarity (less change).
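For reference, the two dissimilarity metrics named in the caption can be sketched as follows. This is a generic illustration of the standard definitions, assuming each comparison is between two species vectors (for example, the same site in two consecutive years). The function names and example counts are made up for illustration.

```python
import numpy as np

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity for abundance data (0 = identical, 1 = no overlap)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    total = a.sum() + b.sum()
    if total == 0:
        return 0.0  # both communities empty: treated as no change (an assumption)
    return float(np.abs(a - b).sum() / total)

def jaccard(a, b):
    """Jaccard dissimilarity for presence-absence data (0 = identical, 1 = no shared species)."""
    a = np.asarray(a) > 0
    b = np.asarray(b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # no species present in either community (an assumption)
    shared = np.logical_and(a, b).sum()
    return float(1.0 - shared / union)

# Made-up counts for four species at one site in two consecutive years
year_1 = [10, 0, 5, 3]
year_2 = [6, 2, 5, 0]

bc = bray_curtis(year_1, year_2)  # (4 + 2 + 0 + 3) / (18 + 13) = 9/31
jd = jaccard(year_1, year_2)      # 2 shared species out of 4 in total: 1 - 2/4 = 0.5
```

Bray-Curtis responds to changes in abundance even when the species list stays the same, while Jaccard only responds to species gains and losses. For that reason the two metrics are not directly comparable across datasets of different data types.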

In this case, we could perform a post-hoc analysis of whether estimates of dissimilarity differ by data type (observed versus modeled) and whether this difference varies by dataset. This could be a quick frequentist model (a GLMM with a random effect of site-year to account for repeated measures, and fixed effects of data type, dataset ID, and their interaction) or a follow-up Bayesian model.
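One way to write down the post-hoc model is sketched below. The notation is introduced here for illustration only: d_i is a dissimilarity estimate, modeled_i is an indicator for data type (1 = modeled, 0 = observed), k[i] is the dataset of estimate i, and s[i] is its site-year. The Gaussian error term is an assumption. Because dissimilarity is bounded between 0 and 1, a beta distribution with a logit link may be a better fit.

```latex
\begin{aligned}
d_i &= \beta_0 + \beta_1\,\mathrm{modeled}_i + \gamma_{k[i]}
       + \delta_{k[i]}\,\mathrm{modeled}_i + b_{s[i]} + \varepsilon_i, \\
b_s &\sim \mathcal{N}(0, \sigma_b^2), \qquad
\varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

Here \beta_1 is the overall shift in dissimilarity from accounting for detection error, \gamma_k is the dataset effect, \delta_k is the data type by dataset interaction, and b_s is the site-year random intercept that pairs each observed estimate with its modeled counterpart.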

We could also explore different definitions of the “observed” data. Right now, I’m taking the maximum number of individuals observed per species in each site-year combination as the “observed” value, but this assumes that observers survey each site more than once each year. This is often not the case; often we survey only once and call it good. So this figure/analysis could include a second definition of “observed” data: for each site-year combination, take the single survey with the maximum number of individuals/species observed (assuming observers go out at the time that maximizes detection in their system), rather than summarizing across surveys as I have done. I expect this method would produce an even larger spread in the observed data than what you see here.
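The difference between the two definitions of “observed” data can be sketched for a single site-year as follows. The array layout (rows are repeat surveys, columns are species), the counts, and the function names are all hypothetical. The “best survey” here is the one with the most individuals in total; ranking surveys by number of species detected would be an alternative.

```python
import numpy as np

def observed_max_per_species(counts):
    """Current approach: maximum count per species across all repeat surveys."""
    counts = np.asarray(counts)
    return counts.max(axis=0)

def observed_best_single_survey(counts):
    """Alternative: keep only the one survey with the most individuals in total."""
    counts = np.asarray(counts)
    best = counts.sum(axis=1).argmax()  # first survey with the largest total
    return counts[best]

# Made-up counts: 3 repeat surveys (rows) x 3 species (columns) in one site-year
surveys = np.array([
    [3, 0, 1],
    [1, 2, 0],
    [2, 1, 4],
])

max_obs = observed_max_per_species(surveys)      # [3, 2, 4]
best_obs = observed_best_single_survey(surveys)  # third survey: [2, 1, 4]
```

The per-species maximum can never be smaller than the single-survey value for any species, because it pools information across visits.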

Qualitative assessment of detection probabilities

Depending on whether datasets vary greatly in their observed versus modeled estimates, we could do a qualitative assessment of whether this is related to relative “rarity” in the dataset, based on the distribution of detection probabilities for each species in the dataset:

Figure 5

Q2: Environmental drivers

For regression models, we could illustrate the effects of all covariates in the model:

All covariate effects

Figure 6: Covariate effects for the SAM regression models. Positive effects mean that higher values of those covariates lead to more community change.

Significant covariate effects and importance weights

We could also show how the important covariates shape community change and whether their effects are relatively instantaneous or lagged:

Figure 7: Partial plots of important covariates in each SAM model (temperature and plant biomass for SEV LTER grasshoppers; temperature for SBC LTER fish). The first panel for each effect shows the relationship, and the second panel shows the importance weight of each season in each past year for that effect.
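As a reminder of what the importance weights represent, here is a minimal sketch of the antecedent covariate at the core of the SAM framework: a weighted sum of current and past covariate values, with weights that sum to one. In a fitted SAM model the weights are estimated (typically with a Dirichlet prior). Here they are fixed, made-up numbers, and the function name and temperature series are hypothetical.

```python
import numpy as np

def antecedent_covariate(x, weights):
    """Weighted sum of current and lagged covariate values.

    x:       1-D covariate time series, oldest value first.
    weights: importance weights for lags 0, 1, ..., K-1 (must sum to 1);
             lag 0 is the current time step.
    Returns an array aligned with x; the first K-1 entries are NaN because
    not enough history is available for them.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("importance weights must sum to 1")
    n_lags = len(w)
    out = np.full(len(x), np.nan)
    for t in range(n_lags - 1, len(x)):
        # Values at t, t-1, ..., t-(K-1), matching the order of the weights
        lagged = x[t - np.arange(n_lags)]
        out[t] = np.dot(w, lagged)
    return out

# Made-up temperature series and weights (current step weighted most heavily)
temperature = [10.0, 12.0, 14.0, 16.0]
w = [0.5, 0.3, 0.2]
ant_temp = antecedent_covariate(temperature, w)
# ant_temp[2] = 0.5*14 + 0.3*12 + 0.2*10 = 12.6
```

A weight concentrated at lag 0 corresponds to a relatively instantaneous effect, while weight spread over earlier time steps corresponds to a lagged effect.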

Figure feedback

We are limited to 6 figures and tables in the paper, and many of these figures could be multi-panel to highlight all four datasets. Do you all have thoughts on which figures (or multi-panel figures) you think belong in the main text versus in the supplement? I have some thoughts, but would love your feedback as well.

Next steps

Shelby and I are currently working on getting the final datasets through this workflow.

An and I are drafting the manuscript to circulate.

Timeline

We have a tight timeline over the next couple of months, so thank you all in advance for any and all contributions! Here are the important checkpoints we are aiming for along the way:

  • Mid-November: Draft of manuscript sent to all collaborators
  • November 30: Draft back to An and me to incorporate edits
  • December 15: Draft back out to co-authors for second review
  • December 30: Draft back to An and me to incorporate edits
  • January 15: Submission deadline

Thanks!